Saturday, June 06, 2009

STL Visualization

As of today, KDevelop can nicely display std::vector. I'll omit the obvious screenshot and just point to a mailing list post with instructions for trying it. Instead, I'll tell the story of this feature.

For its entire history, GDB did not have any official way to display types from the C++ Standard Library sensibly. Several third-party scripts appeared, written in GDB's internal scripting language. However, they were fairly limited. You had to run those scripts explicitly, and all you got was unstructured text output, making robust IDE integration impossible. Also, GDB's scripting language is itself unpleasant, and does not even have access to internal data structures and functions. It was clear that we needed a way to write pretty-printers in a real scripting language, with full access to GDB data structures and proper integration with the frontend interface.

The first prototype of Python-based pretty-printing was written by me during a free hack slot at a CodeSourcery company meeting. It took maybe 4 hours, if not less, and could display std::string as a string automatically. Some 4 hours more led to the first public prototype. This version could automatically display std::vector as "[1,2]". The second prototype could finally display the elements of std::vector as children, as one would expect in a frontend's variables tree, and even report when new elements were added to the vector. However, this version took a couple of days of work, exposed a mere 4 functions from GDB to Python, and was a mess internally. It was clearly already outside the "quick hack" range.

Those prototypes would never have turned into anything, were it not for Tom Tromey and Thiago Bauermann, who started a project to add complete Python scripting to GDB. This is much more ambitious than just pretty-printing. In particular, it includes defining new commands in Python, with full access to GDB internals. You can read more details in a post series by Tom.

Pretty-printing became a part of that larger effort, and was greatly improved. One of the most notable changes was incremental fetch of children. According to the C++ standard, an object does not exist until its constructor has exited. However, gcc debug info just lists all local variables in a block, including those not yet constructed. A naive pretty-printer, when invoked on such a variable, would likely wander into an uncharted part of memory trying to fetch all children, and never return. To fix this, the Python pretty-printers were designed to fetch children incrementally, using Python iterators, and the GDB MI interface was also adjusted to be more incremental (yes, it's a trend). Beyond that, we spent at least 3 weeks iterating on finer details. The GDB patch was finally checked in on Sep 15, and the KDevelop4 patch shortly after.
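To make the incremental fetch idea concrete, here is a minimal sketch that runs outside GDB. FakeVector, VectorPrinter and fetch_children are hypothetical names invented for this illustration, and the "memory" is just a dictionary. Only the overall shape, a children method that yields (name, value) pairs one at a time, mirrors what a GDB Python pretty-printer provides.

```python
import itertools


class FakeVector:
    """Stand-in for a std::vector whose begin/end pointers may be garbage."""

    def __init__(self, start, finish, memory):
        self.start = start      # address of the first element
        self.finish = finish    # address one past the last element
        self.memory = memory    # dict: address -> value (simulated memory)


class VectorPrinter:
    """Hypothetical printer: yields children lazily instead of building a list."""

    def __init__(self, val):
        self.val = val

    def children(self):
        address = self.val.start
        index = 0
        while address != self.val.finish:
            # Unmapped addresses read as None in this simulation; a real
            # debugger would report an error or return junk instead.
            yield "[%d]" % index, self.val.memory.get(address)
            address += 1
            index += 1


def fetch_children(printer, limit):
    """Frontend side: ask only for the first `limit` children."""
    return list(itertools.islice(printer.children(), limit))


# A properly constructed vector with two elements.
good = FakeVector(100, 102, {100: 1, 101: 2})
# A not-yet-constructed vector: the pointers are whatever was on the stack.
garbage = FakeVector(7, 2**62, {})

print(fetch_children(VectorPrinter(good), 10))          # [('[0]', 1), ('[1]', 2)]
print(len(fetch_children(VectorPrinter(garbage), 10)))  # 10
```

Building a full list from the garbage vector would effectively never finish, while the iterator lets the caller stop after as many children as it wants to show.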

This is still an early implementation, and might have bugs, but now it's out for everybody to try.

Monday, June 01, 2009

Linking 101

Recently, I've seen more and more people having trouble with link-time errors, as if such an error were the worst kind of luck and could not be fixed by mere mortals. There are many possible reasons, including Java as the default language in universities, and the alarming spread of header-only-philia, but that's for another post. Here, I want to give a simple diagnostic procedure for link-time errors.

Let's lay some groundwork first. If your job is programming in C++, you need to know what the -I and -L options do, and how they differ. Also, given a full path to a library file (with a .a, .so or .lib extension), you should be able to link to that file in two different ways. If you don't know any of the above already, all hope is lost, and you might want to consider other occupations. Otherwise, let's look at the diagnosis steps for the most common error: 'undefined symbol'.

First, understand where the missing symbol is supposedly defined. An educated guess is usually fine. For example, a symbol named boost::system::foobar is most likely contained in the Boost.System library (and it's surprising how many folks fail to guess so). Then, find out how you are supposed to link to that logical component, using the documentation for the component or the corresponding Linux package. For example, you might decide to add -lboost_system to the linker command line.

Second, make sure that the physical library file being used is the right one, and that the linker is not picking a different version of the library from a directory you don't expect. If you get an error while linking your application, you can use the -t flag of the GNU linker (or -Wl,-t on the gcc command line). This will print full paths for every library used, including those specified with the -lfoo syntax. For static linking, this will also tell you which object files from the static libraries were used. If you get an error when running the application, you can use the LD_DEBUG environment variable. If you set that variable to help prior to running your program, you'll get a list of possible values. The handiest value in our case is files.
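The two checks above can be wrapped in a small helper. This is only a sketch: the function names, the g++ default, and the main.o and ./app file names are made up for illustration, while -Wl,-t and LD_DEBUG=files are the options described in the text.

```python
import os
import subprocess


def link_trace_command(objects, libs, output="app", compiler="g++"):
    """Build a link command that asks the GNU linker to trace input files."""
    cmd = [compiler, "-o", output] + list(objects)
    cmd += ["-l" + name for name in libs]
    cmd.append("-Wl,-t")  # forwarded to ld as -t
    return cmd


def loader_debug_env(value="files", base_env=None):
    """Return a copy of the environment with LD_DEBUG set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_DEBUG"] = value
    return env


def run_with_loader_debug(program, value="files"):
    """Run `program` so that the dynamic loader reports what it loads."""
    return subprocess.run([program], env=loader_debug_env(value))


# Show the link command for a hypothetical main.o that needs Boost.System.
print(" ".join(link_trace_command(["main.o"], ["boost_system"])))
# g++ -o app main.o -lboost_system -Wl,-t
```

To inspect a run-time failure, one would call run_with_loader_debug("./app") and read the loader's report on the terminal.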

Third, if you seem to link to the right library, there are three further possibilities. First, maybe the library actually should not include the symbol. This can happen if you use the wrong headers during compilation, and can be debugged by passing the -save-temps option to gcc and checking the generated .ii file. Second, the symbol might be almost there, but slightly different: it may use a different calling convention (on Windows), a different wchar_t mode (also on Windows), somewhat different parameter types, or a different namespace. In that case, you'll have to make sure the compilation options of the application match the library's requirements. Finally, it could be that the library actually lacks the symbol due to a library bug, and you have to complain to the maintainer. To distinguish those cases, you need to manually examine the list of library symbols. With gcc, the 'nm' command will do for static libraries, while 'readelf' can be used on shared libraries (Unix only). I don't know the best way on Windows; suggestions are welcome.
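Examining the symbol list by eye gets tedious, so here is a small sketch that parses the default output of 'nm' and says whether a given mangled name is actually defined. The sample listing is made up for illustration: _Z3fooi is the mangled name of foo(int), and _Z3fool is the mangled name of foo(long), which is exactly the "almost there" case.

```python
def parse_nm(output):
    """Map each symbol name to the set of nm type letters seen for it."""
    symbols = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:        # address, type, name
            kind, name = fields[1], fields[2]
        elif len(fields) == 2:      # undefined symbols have no address
            kind, name = fields
        else:
            continue                # member headers such as "foo.o:", blanks
        symbols.setdefault(name, set()).add(kind)
    return symbols


def is_defined(symbols, name):
    """True if at least one entry for `name` is something other than 'U'."""
    return any(kind != "U" for kind in symbols.get(name, ()))


# Hypothetical nm output for a static library with a single object file.
SAMPLE = """
foo.o:
0000000000000000 T _Z3fooi
                 U _Z3barv
"""

symbols = parse_nm(SAMPLE)
print(is_defined(symbols, "_Z3fooi"))  # True: foo(int) is in the library
print(is_defined(symbols, "_Z3fool"))  # False: the application wants foo(long)
print(is_defined(symbols, "_Z3barv"))  # False: only referenced, not defined
```

In practice the text passed to parse_nm would come from running nm on the library in question.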

That's it for the common case. Below, I list some relatively common specific problems. The list does not claim to be complete, so if you know some other cases, drop me a line.

Static linking. For static linking, the order of libraries on the command line matters, so if you don't see the linker grabbing the object file with your symbol, you might want to either reorder the libraries or use the --start-group option. See the ld documentation for details, and note that the performance cost of the --start-group option might not be a concern these days.
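Why the order matters is easiest to see in a toy model. The sketch below is a deliberately simplified simulation, not how ld is actually implemented: each archive is a dictionary mapping a defined symbol to the symbols its object file needs, archives are scanned once from left to right, and the grouped flag imitates --start-group by rescanning until nothing new is pulled in. The library and symbol names are hypothetical.

```python
def link(undefined, archives, grouped=False):
    """Return the symbols still undefined after scanning the archives."""
    undefined = set(undefined)
    defined = set()

    def scan(archive):
        """Pull every member that satisfies a currently undefined symbol."""
        pulled = False
        progress = True
        while progress:
            progress = False
            for symbol, needs in archive.items():
                if symbol in undefined:
                    undefined.discard(symbol)
                    defined.add(symbol)
                    undefined.update(set(needs) - defined)
                    progress = pulled = True
        return pulled

    if grouped:
        # Keep rescanning all archives until a full pass pulls nothing.
        while any([scan(archive) for archive in archives]):
            pass
    else:
        for archive in archives:
            scan(archive)
    return undefined


libparser = {"parse": ["log"]}  # parse() calls log()
liblog = {"log": []}

print(sorted(link({"parse"}, [libparser, liblog])))                # []
print(sorted(link({"parse"}, [liblog, libparser])))                # ['log']
print(sorted(link({"parse"}, [liblog, libparser], grouped=True)))  # []
```

In the second call, liblog is scanned before anything needs log, so the symbol stays undefined; putting the dependency last or grouping the libraries resolves it.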

References to vtable. The GNU C++ compiler sometimes reports an unresolved reference to 'vtable for SomeClass'. This is generally a poor way of saying that the first non-inline virtual method of SomeClass is not defined. See the GCC FAQ.

Windows DLLs. On Windows, if an application wants to use a function in a DLL, then both the DLL and the application should record this intention, using the __declspec(dllexport) and __declspec(dllimport) pair. If either party does not do so, the linker complains. With mingw, a typical error is undefined reference to `_imp___WHATEVER'. It means that the library is static, whereas the application wants to use a shared library.

Windows import libraries. On Windows, it's not possible to link directly to a DLL. Instead, an import library is created and used, typically by passing the /IMPLIB option to the linker. If the linker does not report any errors, but does not produce an import library either, it's a sure sign that you have not exported any function from the DLL, and have to check the logic that adds __declspec(dllexport).

64-bit compilation. When building 64-bit applications, you can get an error that says something about "relocation R_X86_64_32" and suggests the -fPIC option. The issue here is that 64-bit shared libraries should include only code compiled with -fPIC, so if you link any static libraries into a shared library, those static libraries should also be compiled with -fPIC.